Compressing Inverted Index Using Optimal FastPFOR

Authors

  • Veluchamy Glory
  • Sandanam Domnic
Abstract

Indexing plays an important role in storing and retrieving data in an Information Retrieval System (IRS). The inverted index is the most frequently used indexing structure in IRS. Compression schemes are used to reduce the size of the index and to retrieve data efficiently, because compressed data can be retrieved faster than uncompressed data. High-speed compression schemes can therefore improve the performance of an IRS. In this paper, we study and analyze various compression techniques for 32-bit integer sequences. Previously proposed compression schemes achieve either better compression rates or fast decoding, so their overall decompression speed (disk access + decoding) might not be better. We propose a new compression technique, called Optimal FastPFOR, based on FastPFOR. The proposed method uses a better integer representation and storage structure for compressing the inverted index to improve decompression performance. We used the TREC data collection in our experiments, and the results show that the proposed code can achieve better compression and decompression than FastPFOR and other existing related compression techniques.
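To make the general idea behind the PFOR family concrete, the following is a minimal sketch of patched frame-of-reference coding applied to one block of an inverted list. It is not the authors' Optimal FastPFOR, nor the exact FastPFOR layout: the function names, the exception cost of 40 bits, and the use of a single Python integer as the packed bit area are all assumptions made for illustration. The sketch converts sorted document IDs to gaps, picks a bit width `b` that minimizes an estimated size, packs the low `b` bits of every value, and stores the high bits of the values that do not fit as exceptions.

```python
from typing import List, Tuple

def delta_encode(doc_ids: List[int]) -> List[int]:
    """Turn a sorted posting list into gaps (the first value is kept as-is)."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps

def delta_decode(gaps: List[int]) -> List[int]:
    """Rebuild the posting list from its gaps by prefix summation."""
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out

def choose_width(block: List[int], max_width: int = 32) -> int:
    """Pick the bit width b with the smallest estimated encoded size.

    Assumed cost model (hypothetical): b bits per value, plus 40 bits
    per exception (a 32-bit payload and an 8-bit position).
    """
    best_b, best_cost = max_width, len(block) * max_width
    for b in range(max_width + 1):
        n_exceptions = sum(1 for v in block if v >> b)
        cost = len(block) * b + n_exceptions * 40
        if cost < best_cost:
            best_b, best_cost = b, cost
    return best_b

def pfor_encode(block: List[int]) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Pack the low b bits of each value; keep overflowing high bits as exceptions."""
    b = choose_width(block)
    mask = (1 << b) - 1
    packed = 0
    exceptions = []
    for i, v in enumerate(block):
        packed |= (v & mask) << (i * b)      # low b bits go into the packed area
        if v >> b:                           # value does not fit in b bits
            exceptions.append((i, v >> b))   # remember position and high bits
    return b, packed, exceptions

def pfor_decode(b: int, packed: int,
                exceptions: List[Tuple[int, int]], n: int) -> List[int]:
    """Unpack n values of b bits each, then patch the exceptions back in."""
    mask = (1 << b) - 1
    block = [(packed >> (i * b)) & mask for i in range(n)]
    for i, high in exceptions:
        block[i] |= high << b
    return block

# Usage: round-trip a small, made-up posting list.
postings = [3, 7, 8, 15, 16, 1040, 1041, 1043]
gaps = delta_encode(postings)
b, packed, exceptions = pfor_encode(gaps)
restored = delta_decode(pfor_decode(b, packed, exceptions, len(gaps)))
```

In this sketch one large gap becomes a single exception instead of forcing a wide bit width on the whole block, which is the trade-off between compression rate and decoding work that the abstract refers to.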

Similar resources

Partitioning Inverted Lists for Efficient Evaluation of Set-Containment Joins in Main Memory

We present an algorithm for efficient processing of set-containment joins in main memory. Our algorithm uses an index structure based on inverted files. We focus on improving performance of the algorithm in a main-memory environment by utilizing the L2 CPU cache more efficiently. To achieve this, we employ some optimizations including partitioning the inverted lists and compressing the intermed...

An Asymptotically Optimal Data Compression Algorithm Based on an Inverted Index

The usual method of representing a data sequence drawn from a finite alphabet associates with each location in the sequence, the source letter that appears there. An alternate approach is to associate with each source letter, the list of locations at which it appears in the data sequence [1]. We present a data compression algorithm based on a generalization of this idea. The algorithm parses the ...

A New Compression Based Index Structure for Efficient Information Retrieval

Finding desired information in a large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. The index is the main constituent of an IR system. Nowadays, the exponential growth of information makes the index structure large enough to affect the IR system’s quality. So compressing the Index struct...

Inverted Index Compression

The data structure at the core of today's large-scale search engines, social networks and storage architectures is the inverted index, which can be regarded as being a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by search engines and stringent performance requirements dictated by the heavy load of user queries, the inverted lists often st...

Fast Arabic Query Matching for Compressed Arabic Inverted Indices

Information retrieval systems and Web search engines apply highly optimized techniques for compressing inverted indices. These techniques reduce index sizes and improve the performance of query processing that uses compressed indices to find relevant documents for the users' queries. In this paper, we proposed a novel technique for querying compressed Arabic inverted indices in search engines. ...

Journal title:
  • JIP

Volume 23

Publication date: 2015